280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
نویسندگان
چکیده
We propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the stateof-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.
منابع مشابه
Classifying Wikipedia Articles into NE's Using SVM's with Threshold Adjustment
In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a ...
متن کاملNeed to categorize: A comparative look at the categories of the Universal Decimal Classification system (UDC) and Wikipedia
This study analyzes the differences between the category structure of the Universal Decimal Classification (UDC) system (which is one of the widely used library classification systems in Europe) and Wikipedia. In particular, we compare the emerging structure of category-links to the structure of classes in the UDC. With this comparison we would like to scrutinize the question of how do knowledg...
متن کاملRapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based algorithm with a directed crawler. We exploit the multilingual open-content directory of the W...
متن کاملDocument Categorization using Multilingual Associative Networks based on Wikipedia
Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...
متن کاملCEA LIST's Participation at the CLEF CHiC 2013
For our first participation to the CLEF CHiC Lab, we submitted runs to the multilingual ad-hoc and multilingual semantic enrichment tasks. Given the strong multilingual character of the evaluation corpus, the main objectives of the experiments were to test the efficiency of semantic topic expansion and consolidation based on Explicit Semantic Analysis (ESA) versions in different languages. Anot...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1704.07624 شماره
صفحات -
تاریخ انتشار 2017